In this paper, after a discussion of general properties of statistical tests, we present 
the construction of the most powerful hypothesis test for determining the existence of 
a new phenomenon in counting-type experiments where the observed Poisson process 
is subject to a Poisson distributed background with unknown mean. 
I. INTRODUCTION 

Typical experiments which search for new phenomena such as rare decays (see, for example, 
[1]), new particles (see, for example, [2]) and astronomical gamma-ray and X-ray sources 
(see, for example, [?]) are counting-type experiments. In such experiments the number of 
observed events is distributed according to a Poisson distribution with some average rate. 
Unfortunately, such experiments are often subject to unwanted background, i.e., even if the 
new phenomenon is not present, the experiment will register some number of counts with 
an average background rate. Only in rare cases is the expected background rate known. 
Therefore, to overcome this difficulty, these experiments typically utilize one of several 
available techniques. One of the possibilities is to perform two observations: one for which 
some of the observed counts are believed to originate from the new phenomenon and the 
other for which all observed counts are known to originate from background only; all other 
conditions of the observations are kept the same. Thus, the two observations will yield two 
observed counts $n_1$ and $n_2$ made during observation times $t_1$ and $t_2$ respectively. 
The number of events $n_{1,2}$ in each observation is drawn from the corresponding parent 
Poisson distribution. If the new phenomenon exists, the observations will come from Poisson 
distributions with different average event rates. If, on the contrary, the new phenomenon 
does not exist, the observations will come from Poisson distributions with identical average 
event rates. 
When it is not possible to obtain data due to background only, or to otherwise determine 
the average expected background rate, another approach is often used: the first observation 
is made as before, but the second one is made with the help of computer simulations of 
exactly the same experiment with the new phenomenon "turned off". In other words, the 
observation $n_1$ during time $t_1$ is obtained as in the previous case. The second observation 
$n_2$ during time $t_2$ is obtained by simulating the experiment using only the established laws 
of physics. Since it is believed that the computer simulation correctly describes the data 
collection procedure, $n_2$ can be regarded as drawn from the Poisson distribution with the 
average background rate. In either case, a decision as to the plausibility of the existence of 
the phenomenon is made based on the outcomes of the two observations. Because the 
outcomes of the observations represent random numbers drawn from their respective parent 
distributions, the question of existence of the new phenomenon is here addressed by a 
hypothesis test. 

FIG. 1: Illustration of a critical region with corresponding errors of the first and second kind. 

Various statistical tests have been developed which address the question of testing the 
hypothesis that two independent observations $n_1$ and $n_2$ made during times $t_1$ and $t_2$ are 
due to common background only. The methods, an overview of which can be found in [?], 
mostly use Gaussian-type approximations to the Poisson distribution and are not reliable 
for small numbers of observed events. In this paper we present the construction of the most 
powerful hypothesis test for this situation. That is, we calculate the critical region to be 
used which, for any given probability of claiming consistency with background fluctuation 
(typically a number such as 10"'^ or less), maximizes the probability of detecting a signal. 

II. TESTING A STATISTICAL HYPOTHESIS 

We begin by reviewing the general procedure of hypothesis tests outlined in [4]. A 
statistical hypothesis is a statement concerning the distribution of a random variable $Y$. A 
hypothesis test is a rule for accepting or rejecting the hypothesis based on the outcome of 
an experiment. The hypothesis being tested, called the null hypothesis $H_0$, is formulated in 
such a way that all prior knowledge strongly supports it. The hypothesis is rejected if the 
observed value $y$ of the random variable $Y$ lies within a certain critical region $w$ of the space 
$W$ of all possible outcomes of $Y$, and accepted or doubted otherwise. It follows, then, that if 
there are two tests for the same hypothesis $H_0$, the difference between them consists in the 
difference in critical regions. It also follows that $H_0$ can be rejected when, in fact, it is true 
(error of the first kind) or it can be accepted when some other alternative hypothesis $H_1$ is 
true (error of the second kind). The existence of an alternative hypothesis is clear, otherwise 
the null hypothesis would not be questioned. The probabilities of errors of these two kinds 
depend on the choice of the critical region. The definitions are illustrated in figure 1. The 
probability of occurrence of each event $y$ is given a subscript corresponding to its 
progenitor hypothesis, such as $p_0(y)$ for $H_0$. The region to the right of $y_c$ is selected to be 
the critical one and the two types of errors of the test are marked with different hatching. 

A critical region is said to be the best critical region for testing hypothesis $H_0$ with regard 
to $H_1$ if it is the one which minimizes the probability of the error of the second kind (to accept 
$H_0$ when $H_1$ is true) among all regions which give the same fixed value of the probability of 
the error of the first kind (to reject $H_0$ when it is true). The construction of the best critical 
region, resulting in the most powerful test of $H_0$ with regard to $H_1$, was considered in [?], 
where the problem is solved for the general case of simple hypotheses. A hypothesis is said 
to be simple if it completely specifies the probability of the outcome of the experiment; it 
is composite if the probability is given only up to some unspecified parameters. In general, 
if at least one of the hypotheses is composite, the best critical region may not exist [?]. 

Critical regions $w(\alpha)$ corresponding to different probabilities $\alpha$ of errors of the first kind 
are constructed before the test is performed. When experimental data $y$ are obtained, the 
smallest $\alpha$ is found such that $y \in w(\alpha)$. The observed experimental data are then said to 
be characterized by the p-value equal to $\alpha$. 

The maximum p-value at which the null hypothesis is rejected is called the significance of 
the test and will be denoted as $\alpha_c$. The corresponding critical region will be denoted as 
$w_c = w(\alpha_c)$. The significance $\alpha_c$ is set in advance, before the test is performed, and its choice 
is based on the penalty for making the error of the first kind. (False scientific discoveries 
should not happen very often, and thus the significance is often selected as ac = 10~^.) One 
minus the probability of the error of the second kind is called the power of the test, which 
we denote as $(1 - \beta)$. 

If, as the result of the experiment, the observed data lie inside the critical region $w_c$, it is 
concluded that the null hypothesis is rejected in favor of the alternative one with significance 
$\alpha_c$ and power $(1 - \beta)$. If, however, the observed data lie outside the critical region $w_c$, 
it is concluded that the null hypothesis is not rejected in favor of the alternative one with 
significance $\alpha_c$ and power $(1 - \beta)$. 

Special consideration must be given to the case of a composite null hypothesis $H_0$ of 
the form $p_0(y; \{\lambda\})$ with unknown parameters $\{\lambda\}$. Indeed, suppose a critical region $w$ is 
specified. Then, it is possible to perform the test: if the obtained value of the observable 
$Y$ is inside the critical region $w$, the null hypothesis is rejected; if $y \notin w$, it is accepted. 
However, the probability of the error of the first kind $\alpha(\{\lambda\}) = \int_w p_0(y; \{\lambda\})\,dy$ in general 
depends on the unknown values of the parameters $\{\lambda\}$ of the null hypothesis. Thus, the p-value $\alpha$ 
cannot be assigned and the conclusion of the test cannot be stated. It is therefore desired 
to construct such critical regions $w$ that the probability of the error of the first kind does 
not depend on the values of the unknown parameters. Such regions are called similar to $W$ with 
regard to parameters $\{\lambda\}$. A method for construction of such regions was found in [?] under 
limited conditions which in the case of one parameter $\lambda$ are: 
• the probability distribution $p_0(y; \lambda)$ is infinitely differentiable with respect to $\lambda$ and 

• the probability distribution $p_0(y; \lambda)$ is such that if $\Phi(y) = \partial \log p_0 / \partial \lambda$, then 

$$\frac{\partial \Phi}{\partial \lambda} = A + B\,\Phi \qquad (1)$$

where the coefficients $A$ and $B$ are functions of $\lambda$, but are independent of observations $y$. 

If the above conditions are satisfied, critical regions $w$ similar to $W$ with regard to $\lambda$ are 
built up of pieces of the hypersurfaces $\Phi = const$ defined by the likelihood ratio $p_1/p_0 > q$. 

III. NULL HYPOTHESIS BEING TESTED 

As was pointed out above, in a typical counting-type experiment two independent 
observations yielding two counts $n_1$ and $n_2$ are made during time periods $t_1$ and $t_2$ respectively 
with all other conditions being equal. Because it is assumed that each event carries no 
information about another, each of the observed counts can be regarded as being drawn from 
a Poisson distribution with some value of the parameter. Inasmuch as an attempt is being 
made to establish the existence of a new phenomenon, the null hypothesis $H_0$ is formulated 
as: $n_1$ and $n_2$ constitute an independent sample of size 2 from a single Poisson distribution 
(adjusted for the duration of observation) which is due to a common background with some 
unknown count rate $\lambda$, or 

$$H_0: \quad p_0(n_1, n_2; \lambda) = \frac{(\lambda t_1)^{n_1}}{n_1!}\, e^{-\lambda t_1}\, \frac{(\lambda t_2)^{n_2}}{n_2!}\, e^{-\lambda t_2}$$

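The alternative hypothesis is that the two observations are due to Poisson distributions 
with different unknown count rates $\lambda_1$ and $\lambda_2$ respectively ($\lambda_1 \neq \lambda_2$): 

$$H_1: \quad p_1(n_1, n_2; \lambda_1, \lambda_2) = \frac{(\lambda_1 t_1)^{n_1}}{n_1!}\, e^{-\lambda_1 t_1}\, \frac{(\lambda_2 t_2)^{n_2}}{n_2!}\, e^{-\lambda_2 t_2}$$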

The usual physical situation is that one of the count rates is considered to have some 
amount due to the new process and the remainder due to the background process [e.g. 
$\lambda_1 = (\lambda + \lambda_{signal})$ and $\lambda_2 = \lambda$], thus the formulated hypothesis test matches the physical 
problem given in the introduction. 

It is seen that for the case of interest both hypotheses are of composite type (the $\lambda$'s are 
unspecified), which complicates the construction of the test. 
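
As an illustration of why a composite null hypothesis can still be tested here, the following sketch (ours, not part of the original derivation; all function names and numbers are illustrative assumptions) evaluates the joint Poisson probabilities of the two hypotheses and shows numerically that, once the total count is fixed, the distribution of the first count under the null hypothesis no longer depends on the unknown common rate.

```python
import math

def poisson_pmf(n, mean):
    """Probability of n counts from a Poisson distribution with mean > 0."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def p_null(n1, n2, lam, t1, t2):
    """Joint probability of (n1, n2) under H0: one common rate lam."""
    return poisson_pmf(n1, lam * t1) * poisson_pmf(n2, lam * t2)

def p_alt(n1, n2, lam1, lam2, t1, t2):
    """Joint probability of (n1, n2) under H1: rates lam1 and lam2."""
    return poisson_pmf(n1, lam1 * t1) * poisson_pmf(n2, lam2 * t2)

def conditional_null(n1, n_total, lam, t1, t2):
    """P(n1 | n1 + n2 = n_total) under H0, computed directly from p_null."""
    norm = sum(p_null(k, n_total - k, lam, t1, t2) for k in range(n_total + 1))
    return p_null(n1, n_total - n1, lam, t1, t2) / norm

# Illustrative numbers only: two very different background rates
# give the same conditional probability.
t1, t2 = 2.0, 5.0
for lam in (0.3, 4.0):
    print(lam, conditional_null(3, 8, lam, t1, t2))
```

The two printed probabilities should agree up to rounding, and both should match a binomial distribution with success probability $t_1/(t_1+t_2)$; this is the property that makes the error of the first kind computable without knowing the background rate.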

IV. THE MOST POWERFUL TEST 

The formulated probability distribution $p_0(n_1, n_2)$ satisfies the conditions of a special case 
considered in [?], which facilitates the search for the best critical region in $(n_1, n_2)$ space. 
The probability distribution $p_0(n_1, n_2)$ satisfies the following conditions: 

• $p_0(n_1, n_2)$ is infinitely differentiable with respect to $\lambda$, 

• the function $\Phi(n_1, n_2)$ defined as $\Phi(n_1, n_2) = \partial \log p_0 / \partial \lambda$ satisfies equation (1) with 

$$\Phi(n_1, n_2) = \frac{n_1 + n_2}{\lambda} - (t_1 + t_2), \qquad A = -\frac{t_1 + t_2}{\lambda} \quad \text{and} \quad B = -\frac{1}{\lambda}$$
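
The two conditions can be checked numerically. The sketch below is ours and rests on stated assumptions: it takes $\Phi = (n_1+n_2)/\lambda - (t_1+t_2)$, $A = -(t_1+t_2)/\lambda$ and $B = -1/\lambda$ (the values obtained by differentiating the logarithm of the common-rate Poisson probability) and compares them with finite differences. Function names and the test point are illustrative.

```python
import math

def log_p0(n1, n2, lam, t1, t2):
    """Logarithm of the joint Poisson probability under the common-rate hypothesis."""
    return (n1 * math.log(lam * t1) - lam * t1 - math.lgamma(n1 + 1)
            + n2 * math.log(lam * t2) - lam * t2 - math.lgamma(n2 + 1))

def phi(n1, n2, lam, t1, t2):
    """Assumed analytic form of Phi = d(log p0)/d(lam)."""
    return (n1 + n2) / lam - (t1 + t2)

def check_condition(n1, n2, lam, t1, t2, h=1e-5):
    """Return two residuals that should both be close to zero.

    First: central-difference derivative of log p0 minus the analytic Phi.
    Second: central-difference derivative of Phi minus (A + B * Phi).
    """
    phi_numeric = (log_p0(n1, n2, lam + h, t1, t2)
                   - log_p0(n1, n2, lam - h, t1, t2)) / (2 * h)
    dphi_numeric = (phi(n1, n2, lam + h, t1, t2)
                    - phi(n1, n2, lam - h, t1, t2)) / (2 * h)
    a_coeff = -(t1 + t2) / lam   # depends on lam only, not on the counts
    b_coeff = -1.0 / lam         # depends on lam only, not on the counts
    value = phi(n1, n2, lam, t1, t2)
    return phi_numeric - value, dphi_numeric - (a_coeff + b_coeff * value)

print(check_condition(7, 5, 1.7, 2.0, 3.0))
```

Both residuals are limited only by the finite-difference step, which supports the stated form of the coefficients for this model.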

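Therefore, the best critical region corresponding to the error of the first kind $\alpha$, 
determined from $\Phi = const$, is built up of pieces of the lines $n_t = n_1 + n_2 = const$. The segments 
of each line are those for which the ratio of likelihoods $p_1/p_0$ is greater than some constant 
$q_\alpha$. This translates to: 

$$\frac{p_1}{p_0} = \left(\frac{\lambda_1}{\lambda}\right)^{n_1} \left(\frac{\lambda_2}{\lambda}\right)^{n_2} e^{-(\lambda_1 - \lambda) t_1 - (\lambda_2 - \lambda) t_2} \ge q_\alpha$$

which, on a line of constant $n_t$, can be written as: 

$$n_1 \ge n_\alpha, \qquad \lambda_1 > \lambda_2 \qquad (2)$$

$$n_1 \le n_\alpha, \qquad \lambda_1 < \lambda_2 \qquad (3)$$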

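where the critical value $n_\alpha$ is chosen to satisfy the desired probability of the error of the 
first kind $\alpha$: 

$$\alpha \sum_{k=0}^{n_t} p_0(k,\, n_t - k) = \sum_{k=n_\alpha}^{n_t} p_0(k,\, n_t - k), \qquad \lambda_1 > \lambda_2$$

$$\alpha \sum_{k=0}^{n_t} p_0(k,\, n_t - k) = \sum_{k=0}^{n_\alpha} p_0(k,\, n_t - k), \qquad \lambda_1 < \lambda_2$$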

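Substituting explicitly the expression for $p_0(n_1, n_2)$ into these equations, we obtain 

$$\alpha = (1 + \gamma)^{-n_t} \sum_{k=n_\alpha}^{n_t} C_{n_t}^{k}\, \gamma^{k} = I_{\frac{\gamma}{1+\gamma}}(n_\alpha,\, n_t - n_\alpha + 1), \qquad \lambda_1 > \lambda_2 \qquad (4)$$

$$\alpha = (1 + \gamma)^{-n_t} \sum_{k=0}^{n_\alpha} C_{n_t}^{k}\, \gamma^{k} = I_{\frac{1}{1+\gamma}}(n_t - n_\alpha,\, n_\alpha + 1), \qquad \lambda_1 < \lambda_2 \qquad (5)$$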

where $\gamma = t_1/t_2 > 0$, $C_n^m = \frac{n!}{m!\,(n-m)!}$ are binomial coefficients and $I_x(a, b)$ is the normalized 
incomplete beta function. It must be emphasized that the critical value $n_\alpha$ depends on the 
parameters $\lambda_{1,2}$ of the alternative hypothesis only via the relation $\lambda_1 < \lambda_2$ or $\lambda_1 > \lambda_2$. The 
best critical region for testing the null hypothesis against the alternative with $\lambda_1 \neq \lambda_2$ does 
not exist, but it does exist for testing against $\lambda_1 > \lambda_2$ or $\lambda_1 < \lambda_2$ separately, that is, when 
the signal is a source or a sink respectively. The equations (2) with (4) or (3) with (5) 
define the best critical region $w(\alpha)$ in the space $(n_1, n_2)$ for testing $H_0$ with regard to $H_1$ 
(defined for $\lambda_1 < \lambda_2$ and $\lambda_1 > \lambda_2$ separately) corresponding to the probability of the error 
of the first kind $\alpha$. The boundary of this critical region is found by solving the equation (4) 
or (5) with respect to $n_\alpha$ for all possible values of $n_t$. Owing to the discrete nature of the 
observed number of events, these equations might not have solutions for the specified level 
of significance $\alpha_c$. Nevertheless, it is possible to construct a conservative critical region such 
that the probability to observe data within the region does not exceed the preset level $\alpha_c$ if 
the null hypothesis is true. This is done by requiring: 
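$$\alpha_c < I_{\frac{\gamma}{1+\gamma}}(n_\alpha - 1,\, n_t - n_\alpha + 2), \qquad \alpha_c \ge I_{\frac{\gamma}{1+\gamma}}(n_\alpha,\, n_t - n_\alpha + 1), \qquad \lambda_1 > \lambda_2$$

$$\alpha_c \ge I_{\frac{1}{1+\gamma}}(n_t - n_\alpha,\, n_\alpha + 1), \qquad \alpha_c < I_{\frac{1}{1+\gamma}}(n_t - n_\alpha - 1,\, n_\alpha + 2), \qquad \lambda_1 < \lambda_2$$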
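
A conservative critical value of this kind can be found by a direct search. The following is a minimal sketch under our own assumptions: the incomplete beta function is replaced by the equivalent finite binomial sum, the function names are illustrative, and a return value outside the range of possible counts is our convention for "no observation on this line can reject the null hypothesis".

```python
import math

def binom_upper_tail(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p), written as a finite sum."""
    if m <= 0:
        return 1.0
    return math.fsum(math.comb(n, k) * p**k * (1 - p)**(n - k)
                     for k in range(m, n + 1))

def critical_value_source(n_total, gamma, alpha_c):
    """Smallest n_alpha with P(n1 >= n_alpha | n_total, H0) <= alpha_c.

    Case lambda1 > lambda2. Returns n_total + 1 if no count can reject H0.
    """
    p = gamma / (1 + gamma)          # chance that an event falls in observation 1
    for n_alpha in range(n_total + 2):
        if binom_upper_tail(n_alpha, n_total, p) <= alpha_c:
            return n_alpha
    return n_total + 1

def critical_value_sink(n_total, gamma, alpha_c):
    """Largest n_alpha with P(n1 <= n_alpha | n_total, H0) <= alpha_c.

    Case lambda1 < lambda2. Returns -1 if no count can reject H0.
    """
    q = 1 / (1 + gamma)              # chance that an event falls in observation 2
    for n_alpha in range(n_total, -1, -1):
        # P(n1 <= n_alpha) equals P(n2 >= n_total - n_alpha)
        if binom_upper_tail(n_total - n_alpha, n_total, q) <= alpha_c:
            return n_alpha
    return -1

# Illustrative numbers: 10 events in total, equal observation times.
print(critical_value_source(10, 1.0, 0.05), critical_value_sink(10, 1.0, 0.05))
```

For large total counts a library implementation of the regularized incomplete beta function would be preferable to the explicit sum; the sum is used here only to keep the sketch free of dependencies.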



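The power $(1 - \beta)$ of the test will, of course, depend on the values of the parameters of 
the alternative hypothesis: 

$$(1 - \beta) = \sum_{n_t=0}^{\infty}\ \sum_{k=n_\alpha}^{n_t} p_1(k,\, n_t - k), \qquad \lambda_1 > \lambda_2$$

$$(1 - \beta) = \sum_{n_t=0}^{\infty}\ \sum_{k=0}^{n_\alpha} p_1(k,\, n_t - k), \qquad \lambda_1 < \lambda_2$$

After explicit substitution of $p_1(n_1, n_2)$ into these equations, we obtain: 

$$(1 - \beta) = \sum_{n_t=0}^{\infty} \frac{(\lambda_1 t_1 + \lambda_2 t_2)^{n_t}}{n_t!}\, e^{-(\lambda_1 t_1 + \lambda_2 t_2)}\, I_{\frac{\lambda_1 t_1}{\lambda_1 t_1 + \lambda_2 t_2}}(n_\alpha,\, n_t - n_\alpha + 1), \qquad \lambda_1 > \lambda_2 \qquad (6)$$

$$(1 - \beta) = \sum_{n_t=0}^{\infty} \frac{(\lambda_1 t_1 + \lambda_2 t_2)^{n_t}}{n_t!}\, e^{-(\lambda_1 t_1 + \lambda_2 t_2)}\, I_{\frac{\lambda_2 t_2}{\lambda_1 t_1 + \lambda_2 t_2}}(n_t - n_\alpha,\, n_\alpha + 1), \qquad \lambda_1 < \lambda_2 \qquad (7)$$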
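
The power sum can be evaluated by truncating the sum over the total count. The sketch below is ours and covers only the source case $\lambda_1 > \lambda_2$; the truncation rule, the function names and the example rates are assumptions made for illustration, and the critical value on each line of constant total count is taken to be the conservative one.

```python
import math

def binom_upper_tail(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p), written as a finite sum."""
    if m <= 0:
        return 1.0
    return math.fsum(math.comb(n, k) * p**k * (1 - p)**(n - k)
                     for k in range(m, n + 1))

def critical_value(n_total, gamma, alpha_c):
    """Smallest n_alpha with P(n1 >= n_alpha | n_total, H0) <= alpha_c."""
    p = gamma / (1 + gamma)
    for n_alpha in range(n_total + 2):
        if binom_upper_tail(n_alpha, n_total, p) <= alpha_c:
            return n_alpha
    return n_total + 1

def power_source(lam1, lam2, t1, t2, alpha_c, n_max=None):
    """Probability of rejecting H0 when the true rates are lam1 and lam2."""
    mu = lam1 * t1 + lam2 * t2           # mean of the total count
    p_alt = lam1 * t1 / mu               # chance an event lands in observation 1
    gamma = t1 / t2
    if n_max is None:
        # assumed cut-off: far enough into the Poisson tail for this sketch
        n_max = int(mu + 10 * math.sqrt(mu) + 20)
    total = 0.0
    for n_total in range(n_max + 1):
        weight = math.exp(n_total * math.log(mu) - mu - math.lgamma(n_total + 1))
        n_alpha = critical_value(n_total, gamma, alpha_c)
        total += weight * binom_upper_tail(n_alpha, n_total, p_alt)
    return total

# Illustrative rates: the power should grow with the signal strength.
for lam1 in (1.0, 2.0, 5.0):
    print(lam1, power_source(lam1, 1.0, 5.0, 5.0, 0.05))
```

With equal rates the function returns the actual size of the conservative test, which by construction cannot exceed the preset significance.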



For the purposes of the hypothesis test itself, equations (4) and (5) provide the method for the 
p-value calculation without the need for solving them. To do this, $n_t$ must be set to $(n_1 + n_2)$ 
and $n_\alpha$ to $n_1$; then $\alpha$ is computed from equation (4) or (5). If the obtained p-value $\alpha$ is 
not greater than $\alpha_c$, the null hypothesis is rejected. This is the uniformly most powerful test 
of $H_0$ with regard to $H_1$. 
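
The p-value recipe just described can be sketched in a few lines. This is an illustration under our own naming (the function `p_value` and its `signal` argument are not from the paper); the binomial sum is used in place of the incomplete beta function, to which it is equal.

```python
import math

def p_value(n1, n2, t1, t2, signal="source"):
    """p-value for two counts n1, n2 observed during times t1, t2.

    signal="source" tests against lambda1 > lambda2 (sum from n1 up to n_total),
    signal="sink" tests against lambda1 < lambda2 (sum from 0 up to n1).
    """
    n_total = n1 + n2
    gamma = t1 / t2
    p = gamma / (1 + gamma)
    if signal == "source":
        ks = range(n1, n_total + 1)
    elif signal == "sink":
        ks = range(0, n1 + 1)
    else:
        raise ValueError("signal must be 'source' or 'sink'")
    return math.fsum(math.comb(n_total, k) * p**k * (1 - p)**(n_total - k)
                     for k in ks)

# Illustrative counts: 9 events "on", 1 event "off", equal observation times.
print(p_value(9, 1, 1.0, 1.0))
```

The null hypothesis would then be rejected when the returned value does not exceed the significance chosen in advance.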

It can be seen that the application of the method of best critical region construction to 
the problem of testing whether two observations came from Poisson distributions with 
the same parameter or not has led us to the criterion suggested on intuitive grounds in [?]. 
The presented discussion, however, shows that it is not possible to construct a better test 
for the hypotheses under consideration. The practical use of equations (4)-(7) should not 
present any difficulty using modern computers. 



V. COMPOUNDING RESULTS OF INDEPENDENT TESTS 

It is often the case that the complete data set consists of several runs of the experiment, 
each of which belongs to the counting type subjected to Poisson distributed background with 
unknown means. The data set is then a set of pairs $(n_{1,r}, n_{2,r})$ with corresponding durations 
of observations $(t_{1,r}, t_{2,r})$, where the subscript $r$ enumerates all the runs of the experiment. Here 
we distinguish two cases: first, where the parameters of both hypotheses do not depend 
on the run number $r$, and second, where such independence cannot be asserted because of 
modifications made to the apparatus between the runs. In the first case it can be seen that 
the critical region must be constructed out of pieces of surfaces $n_t = \sum_r (n_{1,r} + n_{2,r}) = const$, 
on which $\sum_r n_{1,r} \ge n_\alpha$ for the case of $\lambda_1 > \lambda_2$ or $\sum_r n_{1,r} \le n_\alpha$ for the case of $\lambda_1 < \lambda_2$. 
It is thus seen that equations (4) and (5) provide a method for the p-value calculation: 
$n_t = \sum_r (n_{1,r} + n_{2,r})$, $n_\alpha = \sum_r n_{1,r}$ and $\gamma = \sum_r t_{1,r} / \sum_r t_{2,r}$. In other words, if the 
parameters of both hypotheses do not depend on the run number $r$, the corresponding 
observations can simply be added. 
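
For runs that share the same parameters, the addition rule amounts to the following sketch. It is our illustration: the tuple layout `(n1, n2, t1, t2)` and the function name are assumptions, and only the source case (first rate larger than the second) is shown.

```python
import math

def combined_p_value(runs):
    """p-value against lambda1 > lambda2 for runs sharing the same rates.

    runs: sequence of (n1, n2, t1, t2) tuples, one per run.
    """
    runs = list(runs)
    n1 = sum(r[0] for r in runs)         # total counts in the first observations
    n2 = sum(r[1] for r in runs)         # total counts in the second observations
    t1 = sum(r[2] for r in runs)         # total duration of the first observations
    t2 = sum(r[3] for r in runs)         # total duration of the second observations
    n_total = n1 + n2
    gamma = t1 / t2
    p = gamma / (1 + gamma)
    return math.fsum(math.comb(n_total, k) * p**k * (1 - p)**(n_total - k)
                     for k in range(n1, n_total + 1))

# Illustrative data: two runs with equal durations.
print(combined_p_value([(4, 1, 1.0, 1.0), (5, 0, 1.0, 1.0)]))
```

Because counts and durations are summed before the test is applied, the result is the same as for a single run with the pooled counts and times.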

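In the second case, the derivation proceeds in a fashion similar to the presented 
derivation of equations (4) and (5): the critical region is built up of surfaces 
$n_{t,r} = n_{1,r} + n_{2,r} = const$ such that 

$$\sum_r n_{1,r} \log(\lambda_{1,r}/\lambda_{2,r}) \ge q'_\alpha$$

The critical value $q'_\alpha$ is chosen to satisfy the desired probability of the error of the first 
kind $\alpha$: 

$$\alpha = \prod_r (1 + \gamma_r)^{-n_{t,r}} \sum_{\substack{k_r \in [0,\, n_{t,r}] \\ \sum_r k_r \log(\lambda_{1,r}/\lambda_{2,r}) \,\ge\, q'_\alpha}} \ \prod_r C_{n_{t,r}}^{k_r}\, \gamma_r^{k_r}$$

with $\gamma_r = t_{1,r}/t_{2,r}$. 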

The p-value is obtained by setting $q'_\alpha$ equal to $\sum_r n_{1,r} \log(\lambda_{1,r}/\lambda_{2,r})$. In this case, the 
critical region depends on the parameters of the alternative hypothesis, but the test becomes 
uniformly most powerful if it is known that the ratios $\lambda_{1,r}/\lambda_{2,r}$ do not depend on the run 
number $r$. The latter situation is common in practice because it reflects a change of the level 
of pre-scaling of events in the data acquisition system or a degradation of efficiency of sensors 
that occurred between the experimental runs. 
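
For a small number of events, the compounded p-value can be obtained by brute-force enumeration of all count combinations. The sketch below is ours: the per-run log-ratios of the rates must be supplied by the user as assumed alternative-hypothesis parameters, the names are illustrative, and a small tolerance is added to the comparison to guard against rounding. The cost grows as the product of the run totals, so it is meant only for low counts.

```python
import itertools
import math

def compound_p_value(runs, log_ratios, tol=1e-12):
    """p-value for runs whose rates may differ from run to run.

    runs:       list of (n1, n2, t1, t2) tuples, one per run
    log_ratios: assumed values of log(lambda1_r / lambda2_r), one per run
    """
    observed = sum(w * n1 for (n1, _, _, _), w in zip(runs, log_ratios))
    ranges = [range(n1 + n2 + 1) for n1, n2, _, _ in runs]
    total = 0.0
    for ks in itertools.product(*ranges):
        statistic = sum(w * k for w, k in zip(log_ratios, ks))
        if statistic >= observed - tol:
            term = 1.0
            for k, (n1, n2, t1, t2) in zip(ks, runs):
                gamma = t1 / t2
                n_t = n1 + n2
                # binomial weight of k first-observation events out of n_t
                term *= math.comb(n_t, k) * gamma**k / (1 + gamma)**n_t
            total += term
    return total

# Illustrative data: two runs with a common rate ratio.
print(compound_p_value([(4, 1, 1.0, 1.0), (5, 0, 1.0, 1.0)], [0.7, 0.7]))
```

When all log-ratios are equal and positive the ordering of outcomes depends only on the summed first-observation counts, so the result no longer depends on the assumed value of the ratio.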

VI. CONCLUSION 

In this paper we have reviewed the basic concepts of statistical hypothesis tests and 
underlined the relevant aspects often employed. The difficulty arises because frequently 
both the null and the alternative hypotheses are of composite type. 

We have considered typical counting experiments and constructed the most powerful sta- 
tistical test. In doing so, we have insisted on the ability to quantify the error of the first kind 
although the parameter of the composite null hypothesis is unknown. The test also happens 
to be the uniformly most powerful with regard to the composite alternative hypothesis with 
$\lambda_1 < \lambda_2$ or $\lambda_1 > \lambda_2$ separately. The constructed test is especially important for the case 
of a small number of events, where previously used methods are inadequate because the usual 
Gaussian-type approximations break down. Fortuitously, this is the case for which the pro- 
posed test can most easily be performed. The existence of the most powerful statistical test 
allows comparisons with other computationally less demanding methods to be made which 
may be important for some applications. 